-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-2417] Add support allowDuplicateInserts in HoodieJavaClient #3644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
0e90c98 to
7ba5582
Compare
|
LGTM! |
vinothchandar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments on the tests, but given the other PR is delayed. Might be okay to land this and reconcile. but still this is just duplicating code from the spark action executor
| records2.add(new HoodieRecord(new HoodieKey(insertRow1.getRowKey(), insertRow1.getPartitionPath()), insertRow1)); | ||
| records2.add(new HoodieRecord(new HoodieKey(insertRow2.getRowKey(), insertRow2.getPartitionPath()), insertRow2)); | ||
|
|
||
| Thread.sleep(1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please avoid relying on sleep in tests. You can generate the commit times in the test, for greater control? Also could we move this test as a separate one for HoodieConcatHandle?
| } | ||
|
|
||
| @Test | ||
| public void testHoodieConcatHandle() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondeing if this can be rewritten more nicely using the TestTable abstraction. cc @xushiyan .
4feee2b to
dc76423
Compare
|
@vinothchandar Hello, I have added comments, removed sleep and rebase with latest master, and put the test cases in separate class: |
| int index = 0; | ||
| for (GenericRecord record : fileRecords) { | ||
| assertEquals(records1.get(index).getRecordKey(), record.get("_row_key").toString()); | ||
| if (index == 3) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also add assertion for index = 2. I assume "number" value should be 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll add it later
nsivabalan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes in general is good. just 1 nit. Can we try to see if we can leverage HoodieTestDataGenerator to write tests.
you can do something like this
dataGen.generateInserts : batch1
dataGen.generateUpdates : batch2
if not for allowDuplicates config set, total record count in snapshot view will be just batch1 records (assuming we updated all records from batch1)
if config is set to true, total record count in snapshot view will be batch1 records + batch2 records. and you can do some assertions to ensure both batch of records are present.
OK, I'll try |
|
since you have already put in time to write some tests, let already existing tests be there. may be in addition, give it a try. |
Tips
What is the purpose of the pull request
Add support allowDuplicateInserts in HoodieJavaClient
Brief change log
(for example:)
Verify this pull request
This change added tests and can be verified as follows:
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.